Keyword Search Result

[Keyword] neural network(855hit)

121-140hit(855hit)

  • Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

    Jae-Won KIM  Hochong PARK  

     
    LETTER-Speech and Hearing

      Publicized:
    2021/07/14
      Vol:
    E104-D No:10
      Page(s):
    1762-1765

    We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed method for multi-task learning provides higher performance for all recognition tasks than individual learning for each task as in conventional methods.

  • Conditional Wasserstein Generative Adversarial Networks for Rebalancing Iris Image Datasets

    Yung-Hui LI  Muhammad Saqlain ASLAM  Latifa Nabila HARFIYA  Ching-Chun CHANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/06/01
      Vol:
    E104-D No:9
      Page(s):
    1450-1458

    The recent development of deep learning-based generative models has sharply intensified interest in data synthesis and its applications. Data synthesis takes on added importance for pattern recognition tasks in which some classes of data are rare and difficult to collect. In an iris dataset, for instance, the minority class samples include images of eyes with glasses, oversized or undersized pupils, misaligned iris locations, and irises occluded or contaminated by eyelids, eyelashes, or lighting reflections. Such class-imbalanced datasets often result in biased classification performance. Generative adversarial networks (GANs) are one of the most promising frameworks; they learn to generate synthetic data through a two-player minimax game between a generator and a discriminator. In this paper, we use the state-of-the-art conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) to generate the minority class of iris images, which greatly reduces the human labor cost of collecting rare data. With our model, researchers can generate as many iris images of rare cases as they want, which helps in developing deep learning algorithms whenever a large dataset is needed.

  • Gated Convolutional Neural Networks with Sentence-Related Selection for Distantly Supervised Relation Extraction

    Yufeng CHEN  Siqi LI  Xingya LI  Jinan XU  Jian LIU  

     
    PAPER-Natural Language Processing

      Publicized:
    2021/06/01
      Vol:
    E104-D No:9
      Page(s):
    1486-1495

    Relation extraction is one of the key basic tasks in natural language processing, in which distant supervision is widely used to obtain large-scale labeled data without expensive labor cost. However, the automatically generated data contains massive noise because of the wrong-labeling problem in distant supervision. To address this problem, existing research mainly focuses on removing sentence-level noise with various sentence selection strategies, which, however, may be inadequate for handling word-level noise. In this paper, we propose a novel neural framework that considers both intra-sentence and inter-sentence relevance to deal with word-level and sentence-level noise from distant supervision, denoted as Sentence-Related Gated Piecewise Convolutional Neural Networks (SR-GPCNN). Specifically, 1) a gate mechanism with multi-head self-attention is adopted to reduce word-level noise inside sentences; 2) a soft-label strategy is utilized to alleviate the wrong-labeling propagation problem; and 3) a sentence-related selection model is designed to further filter sentence-level noise. Extensive experimental results on the NYT dataset demonstrate that our approach filters word-level and sentence-level noise effectively and thus significantly outperforms all the baseline models in terms of both AUC and top-n precision metrics.

  • A ΔΣ-Modulation Feedforward Network for Non-Binary Analog-to-Digital Converters

    Takao WAHO  Tomoaki KOIZUMI  Hitoshi HAYASHI  

     
    PAPER-Circuit Technologies

      Publicized:
    2021/05/24
      Vol:
    E104-D No:8
      Page(s):
    1130-1137

    A feedforward (FF) network using ΔΣ modulators is investigated to implement a non-binary analog-to-digital (A/D) converter. Weighting coefficients in the network are determined to suppress the generation of quantization noise. A moving average is adopted to prevent the analog signal amplitude from increasing beyond the allowable input range of the modulators. The noise transfer function is derived and used to estimate the signal-to-noise ratio (SNR). The FF network output is a non-uniformly distributed multi-level signal, which results in a better SNR than a uniformly distributed one. Also, the effect of the characteristic mismatch in analog components on the SNR is analyzed. Our behavioral simulations show that the SNR is improved by more than 30 dB, or equivalently a bit resolution of 5 bits, compared with a conventional first-order ΔΣ modulator.
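
As a point of reference for the comparison in the abstract above, the following is a minimal behavioral sketch of a conventional first-order ΔΣ modulator with a 1-bit quantizer, together with the usual rule of thumb that one bit of resolution corresponds to about 6.02 dB of SNR. It is not the paper's feedforward network; the function names `first_order_delta_sigma` and `db_to_bits` are hypothetical and chosen only for illustration.

```python
import math


def first_order_delta_sigma(samples):
    """Behavioral model of a first-order delta-sigma modulator.

    Each input sample is expected to lie in [-1, 1]. The output is a
    stream of +1.0 / -1.0 values whose running average tracks the input.
    """
    integrator = 0.0
    feedback = 0.0  # previous quantizer output (0 before the first sample)
    bits = []
    for x in samples:
        # Integrate the error between the input and the fed-back output.
        integrator += x - feedback
        # 1-bit quantizer.
        out = 1.0 if integrator >= 0 else -1.0
        bits.append(out)
        feedback = out
    return bits


def db_to_bits(snr_gain_db):
    """Convert an SNR improvement in dB to an equivalent number of bits."""
    db_per_bit = 20 * math.log10(2)  # about 6.02 dB per bit
    return snr_gain_db / db_per_bit


if __name__ == "__main__":
    stream = first_order_delta_sigma([0.5] * 1000)
    print(sum(stream) / len(stream))   # average of the bit stream
    print(round(db_to_bits(30.0), 2))  # about 4.98 bits
```

Under this rule of thumb, a 30 dB improvement works out to roughly 5 bits, which matches the equivalence stated in the abstract.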

  • Classification Functions for Handwritten Digit Recognition

    Tsutomu SASAO  Yuto HORIKAWA  Yukihiro IGUCHI  

     
    PAPER-Logic Design

      Publicized:
    2021/04/01
      Vol:
    E104-D No:8
      Page(s):
    1076-1082

    A classification function maps a set of vectors into several classes. A machine learning problem is treated as a design problem for partially defined classification functions. To realize classification functions for MNIST handwritten digits, three different architectures are considered: single-unit realization, 45-unit realization, and 45-unit ×r realization. The 45-unit realization consists of 45 ternary classifiers, 10 counters, and a max selector. The test accuracies of these architectures are compared using the MNIST data set.
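
The 45-unit structure described above can be sketched as a voting scheme. This sketch assumes that the 45 classifiers correspond to the 45 unordered pairs of the 10 digits and that each ternary classifier votes for one digit of its pair or abstains; the abstract does not spell this out. The toy classifier built by `make_toy_classifier` is purely hypothetical and stands in for a trained unit.

```python
from itertools import combinations

# All unordered pairs of the ten digits: C(10, 2) = 45 pairs.
PAIRS = list(combinations(range(10), 2))


def make_toy_classifier(i, j):
    """Hypothetical stand-in for a trained ternary classifier.

    It votes for whichever of digits i and j is closer to the scalar
    input x, and abstains (returns None) on an exact tie.
    """
    def classify(x):
        di, dj = abs(x - i), abs(x - j)
        if di == dj:
            return None
        return i if di < dj else j
    return classify


def classify_45_unit(x, classifiers):
    """Combine 45 ternary votes with 10 counters and a max selector."""
    counters = [0] * 10
    for classify in classifiers.values():
        vote = classify(x)
        if vote is not None:
            counters[vote] += 1
    # Max selector: the digit with the most votes wins.
    return max(range(10), key=lambda digit: counters[digit])


classifiers = {pair: make_toy_classifier(*pair) for pair in PAIRS}

if __name__ == "__main__":
    print(len(classifiers))                    # number of pairwise units
    print(classify_45_unit(7.1, classifiers))  # winning digit
```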

  • CJAM: Convolutional Neural Network Joint Attention Mechanism in Gait Recognition

    Pengtao JIA  Qi ZHAO  Boze LI  Jing ZHANG  

     
    PAPER

      Publicized:
    2021/04/28
      Vol:
    E104-D No:8
      Page(s):
    1239-1249

    Gait recognition distinguishes one individual from others according to the natural patterns of human gaits. Gait recognition is a challenging signal processing technology for biometric identification due to the ambiguity of contours and the complex feature extraction procedure. In this work, we proposed a new model - the convolutional neural network (CNN) joint attention mechanism (CJAM) - to classify the gait sequences and conduct person identification using the CASIA-A and CASIA-B gait datasets. The CNN model has the ability to extract gait features, and the attention mechanism continuously focuses on the most discriminative area to achieve person identification. We present a comprehensive transformation from gait image preprocessing to final identification. The results from 12 experiments show that the new attention model leads to a lower error rate than others. The CJAM model improved the 3D-CNN, CNN-LSTM (long short-term memory), and the simple CNN by 8.44%, 2.94% and 1.45%, respectively.

  • Improvement of CT Reconstruction Using Scattered X-Rays

    Shota ITO  Naohiro TODA  

     
    PAPER-Biological Engineering

      Publicized:
    2021/05/06
      Vol:
    E104-D No:8
      Page(s):
    1378-1385

    A neural network that outputs reconstructed images based on projection data containing scattered X-rays is presented, and the proposed scheme exhibits better accuracy than conventional computed tomography (CT), in which the scatter information is removed. In medical X-ray CT, it is a common practice to remove scattered X-rays using a collimator placed in front of the detector. In this study, the scattered X-rays were assumed to have useful information, and a method was devised to utilize this information effectively using a neural network. Therefore, we generated 70,000 projection data by Monte Carlo simulations using a cube comprising 216 (6 × 6 × 6) smaller cubes having random density parameters as the target object. For each projection simulation, the densities of the smaller cubes were reset to different values, and detectors were deployed around the target object to capture the scattered X-rays from all directions. Then, a neural network was trained using these projection data to output the densities of the smaller cubes. We confirmed through numerical evaluations that the neural-network approach that utilized scattered X-rays reconstructed images with higher accuracy than did the conventional method, in which the scattered X-rays were removed. The results of this study suggest that utilizing the scattered X-ray information can help significantly reduce patient dosing during imaging.

  • Low-Power Implementation Techniques for Convolutional Neural Networks Using Precise and Active Skipping Methods Open Access

    Akira KITAYAMA  Goichi ONO  Tadashi KISHIMOTO  Hiroaki ITO  Naohiro KOHMU  

     
    PAPER

      Publicized:
    2020/12/22
      Vol:
    E104-C No:7
      Page(s):
    330-337

    Reducing power consumption is crucial for edge devices using convolutional neural network (CNN). The zero-skipping approach for CNNs is a processing technique widely known for its relatively low power consumption and high speed. This approach stops multiplication and accumulation (MAC) when the multiplication results of the input data and weight are zero. However, this technique requires large logic circuits with around 5% overhead, and the average rate of MAC stopping is approximately 30%. In this paper, we propose a precise zero-skipping method that uses input data and simple logic circuits to stop multipliers and accumulators precisely. We also propose an active data-skipping method to further reduce power consumption by slightly degrading recognition accuracy. In this method, each multiplier and accumulator are stopped by using small values (e.g., 1, 2) as input. We implemented single shot multi-box detector 500 (SSD500) network model on a Xilinx ZU9 and applied our proposed techniques. We verified that operations were stopped at a rate of 49.1%, recognition accuracy was degraded by 0.29%, power consumption was reduced from 9.2 to 4.4 W (-52.3%), and circuit overhead was reduced from 5.1 to 2.7% (-45.9%). The proposed techniques were determined to be effective for lowering the power consumption of CNN-based edge devices such as FPGA.
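
The two skipping ideas in the abstract above can be illustrated in software, although the paper implements them in hardware logic. In this minimal sketch, a threshold of 0 models zero-skipping driven by the input data, and a small positive threshold models active data-skipping, where small inputs are treated as if they were zero. The function name `mac_with_skipping` and the parameter `skip_threshold` are hypothetical.

```python
def mac_with_skipping(inputs, weights, skip_threshold=0):
    """Multiply-accumulate that skips work for small input values.

    skip_threshold=0 skips only exact-zero inputs (zero-skipping).
    skip_threshold>0 also skips inputs with magnitude up to the
    threshold, trading a small accuracy loss for fewer operations.
    Returns the accumulated sum and the number of skipped MACs.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    performed = 0
    for x, w in zip(inputs, weights):
        if abs(x) <= skip_threshold:
            continue  # multiplier and accumulator stay idle
        acc += x * w
        performed += 1
    return acc, len(inputs) - performed


if __name__ == "__main__":
    inputs = [0, 3, 1, 0, 5, 2]
    weights = [4, -1, 2, 7, 3, 1]
    print(mac_with_skipping(inputs, weights))                    # exact result
    print(mac_with_skipping(inputs, weights, skip_threshold=2))  # approximate
```

With the toy data, raising the threshold skips more operations but changes the accumulated value, which mirrors the accuracy and power trade-off the abstract reports.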

  • Maritime Target Detection Based on Electronic Image Stabilization Technology of Shipborne Camera

    Xiongfei SHAN  Mingyang PAN  Depeng ZHAO  Deqiang WANG  Feng-Jang HWANG  Chi-Hua CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/04/02
      Vol:
    E104-D No:7
      Page(s):
    948-960

    During the detection of maritime targets, the jitter of the shipborne camera usually causes the video instability and the false or missed detection of targets. Aimed at tackling this problem, a novel algorithm for maritime target detection based on the electronic image stabilization technology is proposed in this study. The algorithm mainly includes three models, namely the points line model (PLM), the points classification model (PCM), and the image classification model (ICM). The feature points (FPs) are firstly classified by the PLM, and stable videos as well as target contours are obtained by the PCM. Then the smallest bounding rectangles of the target contours generated as the candidate bounding boxes (bboxes) are sent to the ICM for classification. In the experiments, the ICM, which is constructed based on the convolutional neural network (CNN), is trained and its effectiveness is verified. Our experimental results demonstrate that the proposed algorithm outperformed the benchmark models in all the common metrics including the mean square error (MSE), peak signal to noise ratio (PSNR), structural similarity index (SSIM), and mean average precision (mAP) by at least -47.87%, 8.66%, 6.94%, and 5.75%, respectively. The proposed algorithm is superior to the state-of-the-art techniques in both the image stabilization and target ship detection, which provides reliable technical support for the visual development of unmanned ships.

  • Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials Open Access

    Takaaki SAEKI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/04/16
      Vol:
    E104-D No:7
      Page(s):
    1002-1016

    This paper proposes two high-fidelity and computationally efficient neural voice conversion (VC) methods based on a direct waveform modification using spectral differentials. The conventional spectral-differential VC method with a minimum-phase filter achieves high-quality conversion for narrow-band (16 kHz-sampled) VC but requires heavy computational cost in filtering. This is because the minimum phase obtained using a fixed lifter of the Hilbert transform often results in a long-tap filter. Furthermore, when we extend the method to full-band (48 kHz-sampled) VC, the computational cost is heavy due to increased sampling points, and the converted-speech quality degrades due to large fluctuations in the high-frequency band. To construct a short-tap filter, we propose a lifter-training method for data-driven phase reconstruction that trains a lifter of the Hilbert transform by taking into account filter truncation. We also propose a frequency-band-wise modeling method based on sub-band multi-rate signal processing (sub-band modeling method) for full-band VC. It enhances the computational efficiency by reducing sampling points of signals converted with filtering and improves converted-speech quality by modeling only the low-frequency band. We conducted several objective and subjective evaluations to investigate the effectiveness of the proposed methods through implementation of the real-time, online, full-band VC system we developed, which is based on the proposed methods. The results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality, and 2) the proposed sub-band modeling method for full-band VC can improve the converted-speech quality while reducing the computational cost, and 3) our real-time, online, full-band VC system can convert 48 kHz-sampled speech in real time attaining the converted speech with a 3.6 out of 5.0 mean opinion score of naturalness.

  • SLIT: An Energy-Efficient Reconfigurable Hardware Architecture for Deep Convolutional Neural Networks Open Access

    Thi Diem TRAN  Yasuhiko NAKASHIMA  

     
    PAPER

      Publicized:
    2020/12/18
      Vol:
    E104-C No:7
      Page(s):
    319-329

    Convolutional neural networks (CNNs) have dominated a range of applications, from advanced manufacturing to autonomous cars. For energy cost-efficiency, developing low-power hardware for CNNs is a research trend. Due to the large input size, the first few convolutional layers generally account for most of the latency and hardware resources in a hardware design. To address these challenges, this paper proposes an innovative architecture named SLIT to extract feature maps and reconstruct the first few layers of CNNs. In this reconstruction approach, multiply-accumulate operations are eliminated entirely in the first layers. We evaluate the new topology with the MNIST, CIFAR, SVHN, and ImageNet datasets in an image classification application. Latency and hardware resources of the inference step are evaluated on a ZC7Z020-1CLG484C FPGA chip with the Lenet-5 and VGG schemes. On the Lenet-5 scheme, our architecture reduces latency by 39% and hardware resources by 70%, with a power consumption of 0.456 W, compared to previous works. Although the VGG models achieve only a 10% reduction in hardware resources and latency, we hope our overall results will give new impetus to future studies aiming at higher optimization in hardware design. Notably, the SLIT architecture merges efficiently with most popular CNNs at a slight accuracy cost of 0.27% on MNIST, 0.5% to 1.5% on CIFAR, and approximately 2.2% on ImageNet, with no accuracy loss on SVHN.

  • Low-Complexity Training for Binary Convolutional Neural Networks Based on Clipping-Aware Weight Update

    Changho RYU  Tae-Hwan KIM  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/03/17
      Vol:
    E104-D No:6
      Page(s):
    919-922

    This letter presents an efficient technique to reduce the computational complexity involved in training binary convolutional neural networks (BCNNs). Conventionally, BCNN training focuses on optimizing the sign of each weight element rather than its exact value; in this setting, the sign of an element is unlikely to flip once the element has been updated to a magnitude large enough to be clipped. The proposed technique does not update elements that have been clipped and accordingly eliminates the computations involved in their optimization. The proposed technique reduces complexity by as much as 25.52% in training the BCNN model for the CIFAR-10 classification task, while the accuracy is maintained without severe degradation.
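
The idea of skipping updates for clipped weights can be sketched with a plain gradient step. This is an illustration under stated assumptions, not the letter's algorithm: it assumes latent real-valued weights clipped to [-1, 1], a plain SGD update, and that a weight stays frozen permanently once it has been clipped. The function name `clipping_aware_sgd_step` and the `frozen` mask are hypothetical.

```python
def clipping_aware_sgd_step(weights, grads, frozen, lr=0.1, clip=1.0):
    """One SGD step that skips weights already clipped to +/-clip.

    weights, grads, and frozen are lists of equal length. weights and
    frozen are modified in place. Returns the number of weight updates
    actually computed in this step.
    """
    updates = 0
    for k in range(len(weights)):
        if frozen[k]:
            continue  # sign is assumed settled; skip the update entirely
        w = weights[k] - lr * grads[k]
        updates += 1
        if abs(w) >= clip:
            # Clip the latent weight and stop updating it from now on.
            w = clip if w > 0 else -clip
            frozen[k] = True
        weights[k] = w
    return updates


if __name__ == "__main__":
    weights = [0.95, -0.2, -0.99]
    frozen = [False, False, False]
    print(clipping_aware_sgd_step(weights, [-1.0, 0.5, 1.0], frozen))
    print(clipping_aware_sgd_step(weights, [5.0, 0.5, -5.0], frozen))
    print(weights, frozen)
```

In the toy run, the second step computes fewer updates than the first because two of the three weights were clipped and frozen, which is the source of the saving.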

  • Building Change Detection by Using Past Map Information and Optical Aerial Images

    Motohiro TAKAGI  Kazuya HAYASE  Masaki KITAHARA  Jun SHIMAMURA  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2021/03/23
      Vol:
    E104-D No:6
      Page(s):
    897-900

    This paper proposes a change detection method for buildings based on convolutional neural networks. The proposed method detects building changes from pairs of optical aerial images and past map information concerning buildings. Using high-resolution image pair and past map information seamlessly, the proposed method can capture the building areas more precisely compared to a conventional method. Our experimental results show that the proposed method outperforms the conventional change detection method that uses optical aerial images to detect building changes.

  • Automatically Generated Data Mining Tools for Complex System Operator's Condition Detection Using Non-Contact Vital Sensing Open Access

    Shakhnaz AKHMEDOVA  Vladimir STANOVOV  Sophia VISHNEVSKAYA  Chiori MIYAJIMA  Yukihiro KAMIYA  

     
    INVITED PAPER-Navigation, Guidance and Control Systems

      Publicized:
    2020/12/24
      Vol:
    E104-B No:6
      Page(s):
    571-579

    This study is focused on the automated detection of a complex system operator's condition. For example, in this study a person's reaction while listening to music (or not listening at all) was determined. For this purpose various well-known data mining tools as well as ones developed by authors were used. To be more specific, the following techniques were developed and applied for the mentioned problems: artificial neural networks and fuzzy rule-based classifiers. The neural networks were generated by two modifications of the Differential Evolution algorithm based on the NSGA and MOEA/D schemes, proposed for solving multi-objective optimization problems. Fuzzy logic systems were generated by the population-based algorithm called Co-Operation of Biology Related Algorithms or COBRA. However, firstly each person's state was monitored. Thus, databases for problems described in this study were obtained by using non-contact Doppler sensors. Experimental results demonstrated that automatically generated neural networks and fuzzy rule-based classifiers can properly determine the human condition and reaction. Besides, proposed approaches outperformed alternative data mining tools. However, it was established that fuzzy rule-based classifiers are more accurate and interpretable than neural networks. Thus, they can be used for solving more complex problems related to the automated detection of an operator's condition.

  • A Partial Matching Convolution Neural Network for Source Retrieval of Plagiarism Detection

    Leilei KONG  Yong HAN  Haoliang QI  Zhongyuan HAN  

     
    LETTER-Natural Language Processing

      Publicized:
    2021/03/03
      Vol:
    E104-D No:6
      Page(s):
    915-918

    Source retrieval is the primary task of plagiarism detection. It searches for documents that may be the sources of plagiarism in a suspicious document. State-of-the-art approaches usually rely on classical information retrieval models, such as the probability model or the vector space model, to obtain the plagiarism sources. However, the goal of source retrieval is to obtain the source documents that contain the plagiarized parts of the suspicious document, rather than to rank the documents relevant to the whole suspicious document. To model the “partial matching” between documents, this paper proposes a Partial Matching Convolution Neural Network (PMCNN) for source retrieval. In detail, PMCNN exploits a sequential convolution neural network to extract the plagiarism patterns of contiguous text segments. Experimental results on the PAN 2013 and PAN 2014 plagiarism source retrieval corpora show that PMCNN boosts the performance of source retrieval significantly, outperforming other state-of-the-art document models.

  • Occurrence Prediction of Dislocation Regions in Photoluminescence Image of Multicrystalline Silicon Wafers Using Transfer Learning of Convolutional Neural Network Open Access

    Hiroaki KUDO  Tetsuya MATSUMOTO  Kentaro KUTSUKAKE  Noritaka USAMI  

     
    PAPER

      Publicized:
    2020/12/08
      Vol:
    E104-A No:6
      Page(s):
    857-865

    In this paper, we evaluate a method for predicting regions that contain dislocation clusters, which are crystallographic defects, in photoluminescence (PL) images of multicrystalline silicon wafers. We applied transfer learning of a convolutional neural network to this task. Given a sub-region of a whole PL image as input, the network outputs whether dislocation cluster regions are included in the image of an upper wafer. The network was trained using images of lower wafers at the bottom of dislocation clusters as positive examples. We experimented with three conditions for negative examples: images of wafers at some depth, randomly selected images, and both. We examined accuracy and Youden's J statistic in two cases: predicting the occurrence of dislocation clusters 10 wafers above or 20 wafers above the input wafer. The results show that the accuracy and Youden's J values are not very high, but they are higher than those of the bag-of-features (visual words) method. For our purpose of finding occurrences of dislocation clusters in upper wafers from the input wafer, the randomly selected condition for negative examples is appropriate for prediction 10 wafers above, since its results are consistently better than those of the other negative-example conditions.
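
For readers unfamiliar with the second metric named above, Youden's J statistic is sensitivity plus specificity minus 1. The short sketch below computes it alongside accuracy from confusion-matrix counts. The counts in the usage example are made up for illustration and are not taken from the paper.

```python
def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Hypothetical, imbalanced counts: few positive (defect) regions.
    counts = dict(tp=30, fn=20, tn=900, fp=50)
    print(accuracy(**counts))
    print(youden_j(**counts))
```

On imbalanced data such as this, accuracy can look high while J stays moderate, which is one reason to report both.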

  • Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network

    Hao XIAO  Kaikai ZHAO  Guangzhu LIU  

     
    LETTER-Computer System

      Publicized:
    2021/02/19
      Vol:
    E104-D No:5
      Page(s):
    772-775

    This work presents a DNN accelerator architecture specifically designed for performing efficient inference on compressed and sparse DNN models. Leveraging the data sparsity, a runtime processing scheme is proposed to deal with the encoded weights and activations directly in the compressed domain without decompressing. Furthermore, a new data flow is proposed to facilitate the reusage of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified using the Xilinx Virtex-7 FPGA. Experimental results show it achieves 1.99×, 1.95× faster and 20.38×, 3.04× more energy efficient than CPU and mGPU platforms, respectively, running AlexNet.
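
Processing in the compressed domain can be pictured with a sparse dot product that works directly on encoded operands. The abstract does not describe the accelerator's encoding, so this sketch assumes a simple list of (index, value) pairs sorted by index for both weights and activations. The names `compress` and `sparse_dot` are hypothetical.

```python
def compress(dense):
    """Encode a dense vector as (index, value) pairs for nonzero entries."""
    return [(k, v) for k, v in enumerate(dense) if v != 0]


def sparse_dot(weight_pairs, activation_pairs):
    """Dot product of two index-sorted sparse vectors, no decompression.

    A multiply-accumulate is performed only where both operands are
    nonzero. Returns the result and the number of MACs performed.
    """
    acc = 0.0
    macs = 0
    i = j = 0
    while i < len(weight_pairs) and j < len(activation_pairs):
        w_idx, w_val = weight_pairs[i]
        a_idx, a_val = activation_pairs[j]
        if w_idx == a_idx:
            acc += w_val * a_val
            macs += 1
            i += 1
            j += 1
        elif w_idx < a_idx:
            i += 1  # weight has no matching nonzero activation
        else:
            j += 1  # activation has no matching nonzero weight
    return acc, macs


if __name__ == "__main__":
    weights = [0, 2, 0, 0, -1, 3]
    activations = [1, 0, 0, 4, 5, 2]
    print(sparse_dot(compress(weights), compress(activations)))
```

In the toy vectors, only two of the six positions have both operands nonzero, so only two multiply-accumulates are needed.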

  • Malicious URLs Detection Based on a Novel Optimization Algorithm

    Wang BO  Zhang B. FANG  Liu X. WEI  Zou F. CHENG  Zhang X. HUA  

     
    LETTER-Information Network

      Publicized:
    2021/01/14
      Vol:
    E104-D No:4
      Page(s):
    513-516

    In this paper, the issue of malicious URL detection is investigated. First, a P system is proposed. The new P system is then used to design an optimization algorithm for a BP neural network that detects malicious URLs with better performance. Finally, some examples are included, and the corresponding experimental results demonstrate the advantage and effectiveness of the proposed optimization algorithm.

  • A Hardware Implementation on Customizable Embedded DSP Core for Colorectal Tumor Classification with Endoscopic Video toward Real-Time Computer-Aided Diagnosais System

    Masayuki ODAGAWA  Takumi OKAMOTO  Tetsushi KOIDE  Toru TAMAKI  Bisser RAYTCHEV  Kazufumi KANEDA  Shigeto YOSHIDA  Hiroshi MIENO  Shinji TANAKA  Takayuki SUGAWARA  Hiroshi TOISHI  Masayuki TSUJI  Nobuo TAMBA  

     
    PAPER-VLSI Design Technology and CAD

      Publicized:
    2020/10/06
      Vol:
    E104-A No:4
      Page(s):
    691-701

    In this paper, we present a hardware implementation of a colorectal cancer diagnosis support system using a colorectal endoscopic video image on customizable embedded DSP. In an endoscopic video image, color shift, blurring or reflection of light occurs in a lesion area, which affects the discrimination result by a computer. Therefore, in order to identify lesions with high robustness and stable classification to these images specific to video frame, we implement a computer-aided diagnosis (CAD) system for colorectal endoscopic images with Narrow Band Imaging (NBI) magnification with the Convolutional Neural Network (CNN) feature and Support Vector Machine (SVM) classification. Since CNN and SVM need to perform many multiplication and accumulation (MAC) operations, we implement the proposed hardware system on a customizable embedded DSP, which can realize at high speed MAC operations and parallel processing with Very Long Instruction Word (VLIW). Before implementing to the customizable embedded DSP, we profile and analyze processing cycles of the CAD system and optimize the bottlenecks. We show the effectiveness of the real-time diagnosis support system on the embedded system for endoscopic video images. The prototyped system demonstrated real-time processing on video frame rate (over 30fps @ 200MHz) and more than 90% accuracy.

  • Encrypted Traffic Identification by Fusing Softmax Classifier with Its Angular Margin Variant

    Lin YAN  Mingyong ZENG  Shuai REN  Zhangkai LUO  

     
    LETTER-Information Network

      Publicized:
    2021/01/13
      Vol:
    E104-D No:4
      Page(s):
    517-520

    Encrypted traffic identification aims to predict the traffic type of encrypted traffic. A deep residual convolution network is proposed for this task. The Softmax classifier is fused with its angular variant, which sets an angular margin to achieve better discrimination. The proposed method improves representation learning and achieves excellent results on the public dataset.
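
One way to picture the fusion mentioned above is a weighted sum of two cross-entropy losses, one on ordinary softmax logits and one on cosine logits with an angular margin added to the target class. The abstract does not specify the margin form or the fusion rule, so the additive angular margin and the weight `alpha` used here are assumptions, and all function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_logits(feature, class_weights, target, margin, scale):
    """Scaled cosine logits with an additive angular margin on the target."""
    cosines = [cosine(feature, w) for w in class_weights]
    logits = [scale * c for c in cosines]
    theta = math.acos(max(-1.0, min(1.0, cosines[target])))
    # Assumes theta + margin stays below pi, where cosine is decreasing.
    logits[target] = scale * math.cos(theta + margin)
    return logits


def fused_loss(feature, class_weights, target,
               margin=0.5, scale=16.0, alpha=0.5):
    """Assumed fusion: alpha * softmax loss + (1 - alpha) * margin loss."""
    plain = [sum(a * b for a, b in zip(feature, w)) for w in class_weights]
    angular = angular_margin_logits(feature, class_weights, target,
                                    margin, scale)
    return (alpha * cross_entropy(plain, target)
            + (1.0 - alpha) * cross_entropy(angular, target))


if __name__ == "__main__":
    feature = [1.0, 0.2]
    class_weights = [[1.0, 0.0], [0.0, 1.0]]
    print(fused_loss(feature, class_weights, target=0))
```

The margin lowers the target-class logit, so the angular term penalizes samples that sit close to a decision boundary more than plain softmax does.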
